An Empirical Study of the Impact of Idioms on Phrase Based Statistical Machine Translation of English to Brazilian-Portuguese
Abstract
This paper describes an experiment to evaluate the impact of idioms on the Statistical Machine Translation (SMT) process, using the English/Brazilian-Portuguese language pair. Our results show that, on sentences containing idioms, a standard SMT system achieves about half the BLEU score that the same system achieves on sentences that do not contain idioms. We also provide a short error analysis and outline our planned work to overcome this limitation.
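The comparison in the abstract amounts to scoring one system's output twice: once on the test sentences that contain idioms and once on those that do not. The Python sketch below illustrates that split. It assumes plain corpus-level BLEU as usually defined (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty), one reference per sentence, whitespace tokenization and no smoothing. The function names and the toy sentences are hypothetical illustrations, not the authors' evaluation setup or data. Standard scoring tools may tokenize and smooth differently.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU with a single reference per sentence (a sketch)."""
    clipped = [0] * max_n  # clipped n-gram matches for orders 1..max_n
    totals = [0] * max_n   # n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
            # A hypothesis n-gram is credited at most as often as it occurs in the reference.
            clipped[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if hyp_len == 0 or min(clipped) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    log_p = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: only output shorter than the references is penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_p)

def bleu_by_idiom(hyps, refs, has_idiom) -> Tuple[float, float]:
    """Hypothetical helper: (BLEU on idiom sentences, BLEU on idiom-free sentences)."""
    def subset(flag: bool) -> Tuple[list, list]:
        idx = [i for i, f in enumerate(has_idiom) if f == flag]
        return [hyps[i] for i in idx], [refs[i] for i in idx]
    return corpus_bleu(*subset(True)), corpus_bleu(*subset(False))

# Hypothetical toy data: a literal rendering of "kicked the bucket" misses the
# Brazilian-Portuguese idiom "bater as botas" used in the reference.
hyps = [s.split() for s in ["ele chutou o balde ontem", "ela comprou um carro novo ontem"]]
refs = [s.split() for s in ["ele bateu as botas ontem", "ela comprou um carro novo ontem"]]
print(bleu_by_idiom(hyps, refs, [True, False]))  # → (0.0, 1.0)
```

The toy split is deliberately extreme. With only one sentence per subset, unsmoothed BLEU collapses to zero as soon as no bigram matches. Real test sets contain many sentences per subset, which is why the paper reports a ratio ("about half") rather than a collapse.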
Similar resources
Statistical Phrase-based Machine Translation: Experiments with Brazilian Portuguese
Statistical approaches have recently emerged as the main paradigm in Machine Translation (MT) research. In previous work we have shown that results of a simple statistical word-based MT system may be highly comparable to those produced by a rule-based approach for closely related languages such as Brazilian Portuguese and European Spanish. In this work we take the discussion one step further an...
Evaluation of a Substitution Method for Idiom Transformation in Statistical Machine Translation
We evaluate a substitution-based technique for improving Statistical Machine Translation performance on idiomatic multiword expressions. The method replaces the original idiom with its literal meaning before translation, and a second substitution step replaces literal meanings with idioms after translation. We detail our approach, outline our implementation a...
A Comparative Study of English-Persian Translation of Neural Google Translation
Many studies abroad have focused on neural machine translation, and almost all have concluded that this method is much closer to human translation than machine translation is. Therefore, this paper aimed at investigating whether neural machine translation was more acceptable in English-Persian translation in comparison with machine translation. Hence, two types of text were chosen to be translated...
Segmentation Strategies to Face Morphology Challenges in Brazilian-Portuguese/English Statistical Machine Translation and Its Integration in Cross-Language Information Retrieval
The use of morphology is particularly interesting in the context of statistical machine translation, in order to reduce data sparseness and compensate for any lack of training corpora. In this work, we propose several approaches to introduce morphological knowledge into a standard phrase-based machine translation system. We provide word segmentation using two different tools (COGROO and MORFESSOR) which...
Factored Translation between Brazilian Portuguese and English
Factored translation is an extension of state-of-the-art phrase-based statistical machine translation (PB-SMT). The main difference in the factored translation approach is that a word is not only a token (its surface form) but a vector composed of different information such as lemma, part-of-speech or morphological/syntactic tags. In this paper we present some experiments carried out to train and ...
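The two-step substitution method described in "Evaluation of a Substitution Method for Idiom Transformation in Statistical Machine Translation" (idiom replaced by its literal meaning before translation, literal meaning replaced by an idiom after translation) can be illustrated with the minimal Python sketch below. The idiom lexicons, the toy phrase table and `toy_translate` are hypothetical stand-ins for real idiom dictionaries and a real phrase-based decoder. This is not that paper's implementation, whose details are truncated in the entry.

```python
import re
from typing import Callable, Dict

def substitute(text: str, table: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each key in `table` in a single pass.

    Longer keys are tried first, so a multiword idiom wins over any word inside it.
    """
    if not table:
        return text
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)

def translate_with_idiom_substitution(
    source: str,
    translate: Callable[[str], str],
    src_idiom_to_literal: Dict[str, str],
    tgt_literal_to_idiom: Dict[str, str],
) -> str:
    # Step 1: rewrite source-language idioms as their literal meaning.
    literal_source = substitute(source, src_idiom_to_literal)
    # Step 2: translate with any MT system (a stand-in callable here).
    literal_target = translate(literal_source)
    # Step 3: put target-language idioms back in place of literal meanings.
    return substitute(literal_target, tgt_literal_to_idiom)

# Hypothetical lexicons and a toy word-by-word "translator" standing in for an SMT decoder.
SRC_IDIOMS = {"kicked the bucket": "died"}
TGT_IDIOMS = {"morreu": "bateu as botas"}
TOY_PHRASE_TABLE = {"he": "ele", "died": "morreu", "yesterday": "ontem"}

def toy_translate(sentence: str) -> str:
    return substitute(sentence, TOY_PHRASE_TABLE)

print(translate_with_idiom_substitution(
    "he kicked the bucket yesterday", toy_translate, SRC_IDIOMS, TGT_IDIOMS))
# → ele bateu as botas ontem
```

As written, step 3 rewrites every occurrence of a literal phrase, including occurrences that did not come from an idiom in the source. A practical system would need to track which target spans were produced from substituted idioms.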
Publication date: 2014